AUTHORS: Pallavi Ingale, Sanjay Nalbalwar
ABSTRACT: Supervised speech segregation for a cochannel speech signal is easier if models of predetermined speakers are used instead of models trained over the entire population. Here we propose a signal-to-signal ratio (SSR) independent method that detects speaker identities in a cochannel speech signal using unique speaker-specific features. The proposed Kekre’s Transform Cepstral Coefficient (KTCC) features are robust acoustic features for speaker identification. A text-independent speaker identification system identifies speakers in short segments of the test signal, with a Gaussian mixture model (GMM) classifier performing the identification task. We compare the proposed method with a system using conventional Mel Frequency Cepstral Coefficient (MFCC) features. For experimentation we use spontaneous speech utterances from the candidates, rather than the utterances of the speech separation challenge (SSC) corpus, which follow a command-like structure with a fixed grammar and a limited word list. Identification is performed on short segments of the cochannel mixture, and the two speakers identified in the majority of segments are taken as the two speakers detected for that mixture. With KTCC features, an average speaker detection accuracy of 93.56% is achieved for two-speaker cochannel mixtures. The method produces the best results for cochannel speaker identification despite being text independent. Speaker identification performance is also evaluated for various test segment lengths; KTCC features outperform in the speaker identification task even when the speech segments are very short.
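The segment-wise decision rule described above (the two speakers identified in the most segments win) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-segment labels are assumed to come from a GMM classifier scoring each short segment of the mixture, and the speaker names are placeholders.

```python
from collections import Counter

def detect_two_speakers(segment_ids):
    """Given the per-segment speaker identification decisions for one
    cochannel mixture, return the two speakers identified most often."""
    counts = Counter(segment_ids)
    return [speaker for speaker, _ in counts.most_common(2)]

# Hypothetical per-segment decisions from the GMM classifier:
decisions = ["A", "B", "A", "C", "B", "A"]
print(detect_two_speakers(decisions))  # → ['A', 'B']
```

A longer test signal yields more segments, and hence more votes, which is consistent with the abstract's observation that identification performance varies with test segment length.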
KEYWORDS: Detection of speaker identities, text independent speaker identification, cochannel speech, KTCC
REFERENCES:
[1] W. Yu, L. Jiajun, C. Ning, and Y. Wenhao, Improved monaural speech segregation based on computational auditory scene analysis, EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2013, No.2, 2013, pp. 1-15.
[2] K. Hu, and D. Wang, An iterative model-based approach to cochannel speech separation, EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2013, No.1, 2013, pp. 1-11.
[3] Y. Wang, and D. Wang, A structure-preserving training target for supervised speech separation, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 6107-6111.
[4] A. Reddy and B. Raj, Soft mask methods for single-channel speaker separation, IEEE Transactions on Audio, Speech and Language Processing, Vol.15, No.6, 2007, pp. 1766-1776.
[5] G. Kim, Y. Lu, Y. Hu, and P. Loizou, An algorithm that improves speech intelligibility in noise for normal-hearing listeners, The Journal of the Acoustical Society of America, Vol. 126, No.3, 2009, pp. 1486-1494.
[6] Y. Shao, S. Srinivasan, Z. Jin, and D. Wang, A computational auditory scene analysis system for speech segregation and robust speech recognition, Computer Speech & Language, Vol.24, No.1, 2010, pp. 77-93.
[7] J. R. Hershey, S. J. Rennie, P. A. Olsen, and T.T. Kristjansson, Super-human multi-talker speech recognition: A graphical modeling approach, Computer Speech & Language, Vol.24, No.1, 2010, pp. 45-66.
[8] P. Mowlaee, R. Saeidi, M. G. Christensen, Z. H. Tan, T. Kinnunen, P. Franti, and S. H. Jensen, A joint approach for single-channel speaker identification and speech separation, IEEE Transactions on Audio, Speech, and Language Processing, Vol.20, No.9, 2012, pp. 2586-2601.
[9] D. A. Reynolds, Speaker identification and verification using Gaussian mixture speaker models, Speech Communication, Vol.17, No.1, 1995, pp. 91-108.
[10] M. Cooke, J. R. Hershey, and S. J. Rennie, Monaural speech separation and recognition challenge, Computer Speech & Language, Vol. 24, No.1, 2010, pp. 1-15.
[11] X. Zhao, Y. Wang, and D. Wang, Deep neural networks for cochannel speaker identification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 4824-4828.
[12] J. M. Naik, Speaker Verification: A Tutorial, IEEE Communications Magazine, Vol. 28, No.1, 1990, pp. 42-48.
[13] H. B. Kekre, S. D. Thepade, and A. Maloo, Performance Comparison of Image Retrieval Using Fractional Coefficients of Transformed Image Using DCT, Walsh, Haar and Kekre’s Transform, CSC-International Journal of Image Processing (IJIP), Vol.4, No.2, 2010, pp. 142-155.
[14] D. A. Reynolds, and R. C. Rose, Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Transactions on Speech and Audio Processing, Vol.3, No.1, 1995, pp. 72-83.
[15] T. Giannakopoulos, A. Pikrakis, Introduction to Audio Analysis: A MATLAB® Approach. Academic Press, 2014.